WTEN: An Advanced Coupled Tensor Factorization Strategy for Learning from Imbalanced Data

Authors

  • Quan Do
  • Thanh Pham
  • Wei Liu
  • Kotagiri Ramamohanarao
Abstract

Efficiently learning from imbalanced and sparse data in multi-mode, high-dimensional tensor formats is a significant problem in data mining research. On the one hand, Coupled Tensor Factorization (CTF) has become one of the most popular methods for the joint analysis of heterogeneous sparse data generated from different sources. On the other hand, techniques such as sampling and cost-sensitive learning have been applied to many supervised learning models to handle imbalanced data. This research studies the effectiveness of combining the advantages of CTF and imbalanced-data learning techniques for missing entry prediction, especially for entries with rare class labels. We also investigate the implications of jointly analyzing the main tensor and extra information. One of our major goals is to design a robust weighting strategy for CTF that not only recovers missing entries effectively but also performs well when the entries are associated with imbalanced labels. Experiments on both real and synthetic datasets show that our approach outperforms existing CTF algorithms on imbalanced data.
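To make the idea of a weighted coupled objective concrete, here is a minimal NumPy sketch. It is not the WTEN algorithm, which the abstract does not specify. It assumes a third-order tensor `X` with CP factors `A`, `B`, `C`, an auxiliary matrix `Y` that shares the factor `A` with the tensor, and a hypothetical inverse-class-frequency weighting. All function and variable names are illustrative.

```python
import numpy as np


def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from rank-R CP factors A (IxR), B (JxR), C (KxR)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def inverse_frequency_weights(X, mask):
    """Hypothetical weighting scheme (an assumption, not taken from the paper).

    Each observed entry gets weight n_obs / (n_labels * count_of_its_label),
    so entries with rare labels contribute more to the loss.
    Unobserved entries keep weight 0.
    """
    weights = np.zeros(X.shape, dtype=float)
    labels, counts = np.unique(X[mask], return_counts=True)
    n_obs = counts.sum()
    for label, count in zip(labels, counts):
        weights[mask & (X == label)] = n_obs / (len(labels) * count)
    return weights


def weighted_coupled_loss(X, mask, W, Y, A, B, C, D):
    """Weighted squared error on observed tensor entries plus the coupled matrix term.

    X    : (I, J, K) main tensor
    mask : (I, J, K) boolean array, True where X is observed
    W    : (I, J, K) per-entry weights
    Y    : (I, M) auxiliary matrix coupled with X on the first mode
    A    : (I, R) factor shared by X and Y; B, C, D are the remaining factors
    """
    tensor_residual = (X - cp_reconstruct(A, B, C)) * mask
    tensor_term = np.sum(W * tensor_residual ** 2)
    matrix_term = np.sum((Y - A @ D.T) ** 2)
    return tensor_term + matrix_term
```

In this sketch the loss would be minimized over `A`, `B`, `C`, and `D` with any gradient-based or alternating optimizer. Because `A` appears in both terms, the auxiliary matrix influences the reconstruction of the main tensor.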

Similar resources

A Cost-Sensitive Learning Strategy for Feature Extraction from Imbalanced Data

In this paper, novel cost-sensitive principal component analysis (CSPCA) and cost-sensitive non-negative matrix factorization (CSNMF) methods are proposed for handling the problem of feature extraction from imbalanced data. The presence of highly imbalanced data misleads existing feature extraction techniques to produce biased features, which results in poor classification performance especially...

Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining

This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...

Complete pivoting strategy for the $IUL$ preconditioner obtained from Backward Factored APproximate INVerse process

In this paper, we use a complete pivoting strategy to compute the IUL preconditioner obtained as the by-product of the Backward Factored APproximate INVerse process. This pivoting is based on the complete pivoting strategy of the Backward IJK version of Gaussian Elimination process. There is a parameter $\alpha$ to control the complete pivoting process. We have studied the effect of dif...

A social recommender system based on matrix factorization considering dynamics of user preferences

With the expansion of social networks, the use of recommender systems in these networks has attracted considerable attention. Recommender systems have become an important tool for alleviating users' information overload by providing personalized recommendations of items a user might like, based on past preferences or observed behavior regarding one or more items. In these systems...

Liver CT Annotation via Generalized Coupled Tensor Factorization

This study deals with the missing-answer prediction problem. We address this problem using coupled analysis of the ImageCLEF2014 dataset by representing it as heterogeneous data, i.e., a dataset in the form of matrices. We propose to use an approach based on probabilistic interpretation of tensor factorization models, i.e., Generalized Coupled Tensor Factorization, which can simultaneously fit a l...

Publication date: 2016